COLLECTIVE APPROACH FOR BAYESIAN NETWORK LEARNING FROM DISTRIBUTED HETEROGENEOUS DATABASE
Authors
Abstract
The members of the Committee appointed to examine the dissertation of RONG CHEN find it satisfactory and recommend that it be accepted.

ACKNOWLEDGMENT

Above all, I would like to thank Dr. Krishnamoorthy Sivakumar, who has been my advisor for the last four and a half years. Dr. Sivakumar guided me in my research, gave me support when I needed it most, and taught me a great deal about how to become a successful and independent researcher. My discussions with him helped me form the key component of this dissertation. Special thanks to Dr. Hillol Kargupta, who introduced me to Bayesian networks and distributed data mining and supported my research work. I would also like to thank Dr. Benjamin Joseph Belzer and Dr. Thomas Fischer; their lectures helped me build a solid research background. They also gave me the IRL, which is important to me. I would also like to thank my parents and my wife for the endless love and support they provided me.

In this dissertation we concentrate on learning Bayesian networks (BN) from distributed heterogeneous databases. We need to develop distributed techniques that save communication overhead, offer better scalability, and require minimal communication of possibly secure data. The objective of this work is to learn a collective BN from data that is distributed among geographically diverse sites. The data distribution is heterogeneous. The collective BN must be close to a BN learned by a centralized method and must require only a small amount of data transmission among the sites. In general, the collective learning algorithms have four steps: local learning, sample selection, cross learning, and combination. The key points of the proposed methods are: (1) use the decomposability property of BNs; (2) identify the samples that are most likely to be evidence of cross terms. We show that the low-likelihood samples at each site are the ones most likely to be evidence of cross terms.
One collective structure learning method and two collective parameter learning methods are proposed. For structure learning, the collective method can find the correct structure of the local variables by choosing a base structure learning algorithm with the decomposability property. Some extra links may be introduced due to the hidden variable problem. Sample selection chooses low-likelihood samples at the local sites and transmits them to a central site. In cross learning, the structure of the cross variables and the cross set are identified. In combination, we add all cross links …
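The sample selection step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the two-variable local network, its probability tables, the sample data, and the rule of sending a fixed fraction of the lowest-likelihood samples are all assumptions made for illustration.

```python
import math

# Hypothetical local BN at one site: A -> B, both binary.
# The probability tables below are illustrative values only.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}


def log_likelihood(sample):
    """Log-likelihood of one (a, b) observation under the local BN."""
    a, b = sample
    return math.log(p_a[a]) + math.log(p_b_given_a[a][b])


def select_low_likelihood(samples, fraction):
    """Return the indices of the lowest-likelihood samples.

    Assumption: a fixed fraction of the local data is selected; a
    likelihood threshold would be an equally plausible rule.
    """
    k = max(1, int(len(samples) * fraction))
    ranked = sorted(range(len(samples)), key=lambda i: log_likelihood(samples[i]))
    return sorted(ranked[:k])


# Hypothetical local observations of (A, B).
local_samples = [(0, 0), (0, 0), (1, 1), (0, 1), (0, 0), (1, 0), (1, 1), (0, 0)]

# Indices of the samples this site would transmit to the central site.
selected = select_low_likelihood(local_samples, fraction=0.25)
print(selected)
```

Only the selected indices need to be shared. Each site would send the observations at those indices, so the central site can learn the cross terms from a small subset of the data.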
Similar resources
Learning Bayesian Network Structure from Distributed Data
We propose a collective method to address the problem of learning the structure of a Bayesian network from distributed heterogeneous data sources. In this case, the dataset is distributed among several sites, with different features at each site. The collective method has four steps: local learning, sample selection, cross learning, and combination of the results. The parents of local nodes c...
DisTriB: Distributed Trust Management Model Based on Gossip Learning and Bayesian Networks in Collaborative Computing Systems
The interactions among peers in Peer-to-Peer systems, as distributed collaborative systems, are based on asynchronous and unreliable communications. Trust is an essential and facilitating component in these interactions, especially in such uncertain environments. Various attacks that affect trust are possible due to the large-scale nature and openness of these systems. Peers do not have enough inform...
Clustered Collaborative Filtering Approach for Distributed Data Mining on Electronic Health Records
Distributed Data Mining (DDM) has become one of the promising areas of Data Mining. DDM techniques include the classifier approach and the agent approach. The classifier approach plays a vital role in mining distributed data, with homogeneous and heterogeneous variants depending on the data sites. Homogeneous classifier approach involves ensemble learning, distributed association rule mining, meta-learning an...
An approach to online Bayesian learning from multiple data streams
We present a collective approach to mine Bayesian networks from distributed heterogeneous web-log data streams. In this approach we first learn a local Bayesian network at each site using the local data. Then each site identifies the observations that are most likely to be evidence of coupling between local and non-local variables and transmits a subset of these observations to a central site. A...